From splash to stats: Analysis of swimming performance with R

SEIO 2025

A. González Romero, C. Lancho Martín, Á. Novillo, V. Aceña, J. García-Ochoa, I. Martín de Diego

Data Science Laboratory, Universidad Rey Juan Carlos

Contents

  • Context
    • Data Science
    • Sport Analytics
    • Swimming
  • Static vs dynamic clustering analysis
  • App in RShiny

Data Science


  • Foundations
  • Applications

Sport Analytics: when sport meets data

  • The application of data science, statistics, and technology to analyze and improve athletic performance.
  • Gained momentum in the early 2000s with advances in tracking systems, video analysis, and machine learning.
  • Used in coaching, injury prevention, game strategy, and talent identification.

Sport Analytics: when sport meets data

Famous examples:

  • Moneyball
  • NBA – player tracking, shot charts
  • Football – GPS-based workload monitoring
  • Running, cycling – biomechanical and physiological modeling

Swimming: precision, performance, and patterns

  • Swimming is highly measurable: distances, times, splits, stroke rates, turn times.
  • Small performance changes (e.g., 0.1 s) can decide podium places, so data granularity matters.
  • Technology enables detailed monitoring:
    • Wearable sensors
    • Underwater cameras
    • Biomechanical modeling
    • Lactate and heart rate tracking

Swimming

Student and domain expert: Alonso González Romero

Data

This work applies data science techniques to analyze performance in competitive swimming, using results from the 2024 World Swimming Championships (Budapest) in a 25-meter pool.

  • Focus: race times, splits, strokes, and event-level metrics
  • Context: elite international competition

Data source: Omega Timing
Omega is the official technical sponsor of international swimming events and provides high-precision race data.

Race pacing analysis



Race pacing analysis: Different patterns/strategies



⟶ Clustering

Approach: Static vs dynamic

Datasets can be categorized based on their temporal availability:

  • Static data: Fully available from the beginning of the study. These datasets can be explored, cleaned, and modeled comfortably.
  • Streaming data: Arrives continuously in real time.

Traditional clustering algorithms assume full access to the entire dataset from the start.

Streaming data introduces an implicit temporal dependency: observations at time \(t\) are often related to those at time \(t-1\), creating an evolving structure. Evolutionary clustering algorithms process data sequentially and incorporate past information to update cluster structures in real time.

Static methodology

  • Main goal: group swimmers according to their swimming-speed patterns
  • Techniques:
    • \(k\)-means
    • \(k\)-medoids
    • Agglomerative hierarchical clustering
    • Similarity measures: Euclidean distance and Dynamic Time Warping
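The difference between the two similarity measures listed above can be illustrated with a minimal sketch. It is written in Python purely for illustration (the talk itself is based on R), the split times are hypothetical, and the DTW routine is the textbook dynamic-programming version with absolute difference as local cost, not necessarily the variant used in the study.

```python
import math

def euclidean(a, b):
    """Lap-by-lap distance; both profiles must have the same length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dtw(a, b):
    """Textbook DTW: cheapest monotone alignment of two sequences."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: repeat a[i-1], repeat b[j-1], advance both.
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

# Hypothetical split times in seconds (not taken from the championship data).
early_slowdown = [26.0, 28.5, 29.0, 29.0]  # slows down after the first split
late_slowdown = [26.0, 26.0, 28.5, 29.0]   # same shape, one split later
even_pace = [28.0, 28.0, 28.0, 28.0]

print(euclidean(early_slowdown, late_slowdown))  # penalises the one-split shift
print(dtw(early_slowdown, late_slowdown))        # the shift is absorbed by the alignment
print(dtw(early_slowdown, even_pace))            # a genuinely different pacing shape
```

Under these assumptions, DTW treats the two slowdown profiles as the same pattern shifted in time, while Euclidean distance separates them. Which behaviour is preferable depends on whether the timing of a pace change is itself of interest.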

Static results

Dynamic methodology

  • Main goal: detect race breaks, i.e. key moments of separation between swimmers

  • Based on: "EvolveCluster: an evolutionary clustering algorithm for streaming data" (Nordahl et al. 2022)

  • Adapted to swimming context: observations = swimmer gaps over time

EvolveCluster: an evolutionary clustering algorithm

Algorithm EvolveCluster (Nordahl et al. 2022)

  • \(D\) is a continuous stream of data, segmented into time-based chunks \(D_0, D_1, \ldots, D_t, t\rightarrow \infty\).

  • \(D_0\) is partitioned into \(k\) clusters (via \(k\)-means): \(C_{0}=\{C_{0,1},\dots, C_{0,k}\}\).

  • For each segment \(D_t\), \(t\geq 1\), \(k\)-means is initialized using the centroids from \(C_{t-1}\).

  • The centroids of \(C_{t-1}\) are then removed, and empty clusters are deleted.

  • New centroids are calculated and the partition \(C_t\) is refined:

    • Should any cluster be split in two? Apply \(2\)-means to each cluster, using its two furthest points as initial centroids, to obtain a candidate partition \(C'_t\).

    • The two candidate partitions are compared with a validation measure (e.g. the Silhouette index, SI): if \(\text{SI}(C'_t) > \text{SI}(C_t) + \tau\), then \(C_t \leftarrow C'_t\).
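The steps above can be sketched as follows. This is a simplified illustration under several assumptions, not the reference implementation of Nordahl et al.: it is written in Python rather than R, works on one-dimensional observations only, seeds each segment directly with the previous centroids, reassigns all points after a candidate split, and sets the silhouette of a single-cluster partition to 0 by convention. All function names are hypothetical.

```python
from statistics import mean

def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda c: abs(p - centroids[c])) for p in points]

def kmeans(points, seeds, iters=20):
    """Lloyd iterations from the given seeds; clusters that become empty are dropped."""
    centroids = list(seeds)
    for _ in range(iters):
        labels = assign(points, centroids)
        groups = [[p for p, lab in zip(points, labels) if lab == c] for c in range(len(centroids))]
        updated = [mean(g) for g in groups if g]
        if updated == centroids:
            break
        centroids = updated
    return centroids, assign(points, centroids)

def silhouette(points, labels):
    """Mean silhouette width; 0.0 by convention (an assumption) for a single cluster."""
    clusters = sorted(set(labels))
    if len(clusters) < 2:
        return 0.0
    scores = []
    for i, (p, lab) in enumerate(zip(points, labels)):
        own = [abs(p - q) for j, (q, m) in enumerate(zip(points, labels)) if m == lab and j != i]
        if not own:  # singleton cluster: silhouette is defined as 0
            scores.append(0.0)
            continue
        a = mean(own)
        b = min(mean([abs(p - q) for q, m in zip(points, labels) if m == c])
                for c in clusters if c != lab)
        scores.append((b - a) / (max(a, b) or 1.0))
    return mean(scores)

def refine(points, centroids, labels, tau):
    """Try splitting each cluster in two; accept only if SI improves by more than tau."""
    best = silhouette(points, labels)
    for c in range(len(centroids)):
        members = [p for p, lab in zip(points, labels) if lab == c]
        if len(set(members)) < 2:
            continue
        # In one dimension the two furthest points of a cluster are its min and max.
        halves, _ = kmeans(members, [min(members), max(members)])
        candidate = centroids[:c] + halves + centroids[c + 1:]
        cand_labels = assign(points, candidate)
        if silhouette(points, cand_labels) > best + tau:
            return refine(points, candidate, cand_labels, tau)
    return centroids, labels

def evolve_cluster(segments, k, tau=0.1):
    """Cluster D_0 with k-means, then seed every later segment with the previous centroids."""
    first = sorted(segments[0])
    # Assumption: evenly spaced order statistics as initial seeds for D_0.
    seeds = [first[(2 * i + 1) * len(first) // (2 * k)] for i in range(k)]
    centroids, labels = kmeans(segments[0], seeds)
    history = [labels]
    for segment in segments[1:]:
        centroids, labels = kmeans(segment, centroids)
        centroids, labels = refine(segment, centroids, labels, tau)
        history.append(labels)
    return history

# Hypothetical stream: one compact group in D_0 that separates into two in D_1.
history = evolve_cluster([[0.0, 0.1, 0.2, 0.3], [0.0, 0.1, 1.0, 1.1]], k=1)
print(history)
```

In this toy stream the single cluster found in \(D_0\) is split in \(D_1\), because the candidate two-cluster partition improves the silhouette by more than \(\tau\), while further splits are rejected.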

EvolveCluster

Figure extracted from (Nordahl et al. 2022)

EvolveCluster adapted

Goal: Detect race breaks

Input: Time gap from race leader

  • Splits are evaluated within each cluster.
  • Observations (swimmers) are sorted by gap to leader.
  • A split occurs if two consecutive swimmers are separated by more than \(\tau\).

Refinement strategy:

Let \(x_i\) and \(x_{i+1}\) be consecutive swimmers in cluster \(C_l\). If \(d(x_i, x_{i+1}) > \tau\), split the cluster at that point.

\(\tau\) can also be selected using a validation measure (SI, Dunn index, etc.).

Advantages:

  • \(\tau\) is interpretable as the minimum meaningful gap between swimmers
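The refinement rule can be sketched in a few lines. The example below is a minimal illustration in Python (the talk itself is based on R); the function name `split_by_gap`, the swimmer labels, and the gap values are hypothetical, and \(d(x_i, x_{i+1})\) is assumed to be the difference between the two gaps to the leader. The function handles a single cluster, so in the adapted algorithm it would be applied to each cluster in turn.

```python
def split_by_gap(gaps, tau):
    """Split one cluster into packs wherever consecutive swimmers differ by more than tau.

    gaps maps a swimmer identifier to the time gap (seconds) behind the race leader.
    """
    ordered = sorted(gaps.items(), key=lambda item: item[1])
    if not ordered:
        return []
    packs = [[ordered[0][0]]]
    for (_, prev_gap), (name, gap) in zip(ordered, ordered[1:]):
        if gap - prev_gap > tau:
            packs.append([name])    # race break: start a new pack
        else:
            packs[-1].append(name)  # still within tau of the previous swimmer
    return packs

# Hypothetical gaps to the leader at one split of a race.
gaps_at_split = {"A": 0.00, "B": 0.12, "C": 0.20, "D": 0.95, "E": 1.05, "F": 2.40}
packs = split_by_gap(gaps_at_split, tau=0.5)
print(packs)  # [['A', 'B', 'C'], ['D', 'E'], ['F']]
```

With \(\tau = 0.5\) s, the breaks fall between C and D (0.75 s apart) and between E and F (1.35 s apart), which matches the reading of \(\tau\) as the minimum meaningful gap.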

Dynamic results: Detecting race breaks

From splash to stats - Dashboard

🖥️ Live demo 🤞🏻

Exploratory Data Analysis (EDA)

  • Age

EDA

  • Distribution of swimmers per country

EDA

  • Reaction time

EDA: Race evolution

⟶ Lack of dynamism

EDA: Dynamic race evolution

Conclusions and future work

What we learned

  • We explored both static and dynamic clustering approaches on competitive swimming data
  • From splits, we identified strategic groupings of swimmers based on how they paced their races
  • By analyzing gap-to-leader as streaming data, we captured race dynamics and key moments of separation
  • The combination of both views offers a richer understanding of performance patterns

Despite the limited data, our analysis provides:

  • Tactical insights into pacing strategies
  • Recommendations for swimmers and coaches
  • A framework for interpretable and adaptable analytics

Future Work

  • Improve and expand the interactive dashboard
    • Enhanced visualizations
    • Real-time insights during races
  • Explore and benchmark additional clustering algorithms
  • Collaborate with coaches to translate patterns into training recommendations (aspirational goal)

References

Nordahl, Christian, Veselka Boeva, Håkan Grahn, and Marie Persson Netz. 2022. “EvolveCluster: An Evolutionary Clustering Algorithm for Streaming Data.” Evolving Systems 13 (4): 603–23.

Thanks!

carmen.lancho@urjc.es

@DSLAB_URJC

https://www.datasciencelab.es

Questions?